The execution flow of an Experiment is:
• The Tuner receives the search space and generates configurations (illustrated right after this list).
• The generated configurations are submitted to the training platform(s) to run trials.
• The training results from each platform are sent back to the Advisor.
• New configurations are generated (if needed) for the next round of training.
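In other words, a "configuration" is simply one concrete set of hyperparameter values, and what a trial sends back is a single metric. A minimal illustration (the parameter names come from the MNIST search space shown later in this chapter; the values themselves are made up):

# One configuration generated by the Tuner: a concrete set of hyperparameter values.
configuration = {"batch_size": 64, "hidden_size": 256, "lr": 0.001, "momentum": 0.4}

# What the trial eventually reports back: a single number, e.g. test accuracy.
final_result = 0.97

# Based on results like this, the Tuner decides which configurations to try next.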
From the user's point of view, the workflow is:
• Define the search space by writing a YAML or JSON file in the required format. This example uses YAML.
• Modify the existing model code by adding the NNI API. (Only three lines of code starting with nni need to be inserted; see the sketch right after this list.)
• Define the experiment configuration: set the required parameters in the config.yml file.
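Those three nni lines follow NNI's trial API: fetch a configuration, report intermediate results, and report the final result. A minimal sketch of a trial script (the training code itself is only a placeholder here, not the actual mnist.py):

import nni

def train_and_evaluate(params, epoch):
    # Placeholder for the real training / validation code in mnist.py.
    return 0.9 + epoch * 0.001

params = nni.get_next_parameter()              # 1) receive a configuration from the Tuner

for epoch in range(10):
    accuracy = train_and_evaluate(params, epoch)
    nni.report_intermediate_result(accuracy)   # 2) report progress to NNI after each epoch

nni.report_final_result(accuracy)              # 3) report the final metric, e.g. test accuracy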
Skim through the YAML below for now; it will make much more sense in the later chapters, when you install NNI and verify it yourself.
# Example of config_detailed.yml. Putting all hyperparameters in a single YAML file makes it easier to explain and understand.
# This example shows more configurable fields compared to the minimal "config.yml".
# You can use "nnictl create --config config_detailed.yml" to launch this experiment.
# If you see an error message saying "port 8080 is used",
# use "nnictl stop --all" to stop previous experiments.
experimentName: MNIST # An optional name to help you distinguish experiments.
# Hyper-parameter search space can either be configured here or in a separate file.
# "config.yml" shows how to specify a separate search space file.
# The common schema of search space is documented here:
# https://nni.readthedocs.io/en/stable/Tutorial/SearchSpaceSpec.html
searchSpace:
  batch_size:
    _type: choice
    _value: [16, 32, 64, 128]
  hidden_size:
    _type: choice
    _value: [128, 256, 512, 1024]
  lr:
    _type: choice
    _value: [0.0001, 0.001, 0.01, 0.1]
  momentum:
    _type: uniform
    _value: [0, 1]
trialCommand: python3 mnist.py
# The command to launch a trial. NOTE: change "python3" to "python" if you are using Windows.
trialCodeDirectory: .
# The path of trial code.
# By default it's ".", which means the same directory of this config file.
trialGpuNumber: 1
# How many GPUs should each trial use. CUDA is required when it's greater than zero.
trialConcurrency: 4 # Run 4 trials concurrently.
maxTrialNumber: 10 # Generate at most 10 trials.
maxExperimentDuration: 1h # Stop generating trials after 1 hour.
# Configure the tuning algorithm.
tuner:
  name: TPE
  # Supported algorithms: TPE, Random, Anneal, Evolution, GridSearch, GPTuner, PBTTuner, etc.
  # Full list: https://nni.readthedocs.io/en/latest/Tuner/BuiltinTuner.html
  classArgs: # Algorithm specific arguments. See the tuner's doc for details.
    optimize_mode: maximize # "minimize" or "maximize"
# Configure the training platform.
# Supported platforms: local, remote, openpai, aml, kubeflow, kubernetes, adl.
trainingService:
  platform: local
  useActiveGpu: false
  # NOTE: Use "true" if you are using an OS with graphical interface
  # (e.g. Windows 10, Ubuntu desktop)
  # Reason and details:
  # https://nni.readthedocs.io/en/latest/reference/experiment_config.html#useactivegpu
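One detail worth noting in the searchSpace above: _type: choice picks one of the listed values, while _type: uniform draws a continuous value between the two bounds. A rough illustration of the difference (plain random sampling here only stands in for whatever strategy the Tuner actually uses):

import random

batch_size = random.choice([16, 32, 64, 128])   # choice: pick one element from the list
momentum = random.uniform(0, 1)                 # uniform: any float between the low and high bounds

# A resulting configuration handed to one trial might then look like:
# {"batch_size": 32, "hidden_size": 512, "lr": 0.01, "momentum": 0.7371}
print(batch_size, momentum)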
For the model code (mnist.py), see the notes below. You only need to pay attention to:
• Line 159 of the main program: the line starting with nni.
• Two lines starting with nni inside the def main(args) function (lines 118 and 123); these are the code that communicates with NNI.
So the integration is very simple and concise.
You may also want to note how the model's own parameters are defined, how they are merged with the external (NNI-tuned) parameters, and how the parameters are then used. (The code at the link below will be discussed in later chapters; feel free to skip it if it is not clear yet. A minimal sketch of the pattern follows the link.)
nni/mnist.py at master · microsoft/nni · GitHub
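The merging pattern mentioned above usually looks roughly like this (a sketch of the common argparse pattern, not the exact code in mnist.py; the parameter names follow the search space in this chapter):

import argparse

import nni

# 1) The model's own parameters, with defaults, defined via argparse.
parser = argparse.ArgumentParser()
parser.add_argument("--batch_size", type=int, default=64)
parser.add_argument("--hidden_size", type=int, default=512)
parser.add_argument("--lr", type=float, default=0.01)
parser.add_argument("--momentum", type=float, default=0.5)
args = parser.parse_args()

# 2) Merge them with the external parameters chosen by the Tuner.
params = vars(args)
params.update(nni.get_next_parameter())

# 3) Use the merged values when building the model and optimizer.
print("training with:", params["batch_size"], params["hidden_size"], params["lr"], params["momentum"])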
After several days of concepts, anyone would fall asleep without some hands-on work. In the next chapter we will install NNI on a local machine.